This HTML document contains the output of the CATS-rb transcriptome assembly comparison tool. For more details on each table and figure, refer to the tool’s documentation.

General transcriptome assembly statistics

Table 1. General transcriptome assembly statistics.

Parameter d_melanogaster_bdgp6 RSP_0.005_1_4 RSP_0.01_1_4 RSP_0.02_1_4 RSP_0.005_5_10 RSP_0.01_5_10 RSP_0.02_5_10
N transcripts 35722 27748 30651 36610 18601 18652 19217
Total transcriptome length (bp) 92747399 34167218 32433743 27462499 44994186 44145594 42094487
GC content (%) 49.14% 49.17% 49.34% 49.68% 48.43% 48.45% 48.60%
N, % transcripts with length greater than or equal to 200 bp 34155, 95.61% 23903, 86.14% 25617, 83.58% 28233, 77.12% 18277, 98.26% 18225, 97.71% 18132, 94.35%
N, % transcripts with length greater than or equal to 500 bp 32238, 90.25% 14553, 52.45% 13923, 45.42% 11423, 31.2% 16878, 90.74% 16738, 89.74% 16335, 85%
N, % transcripts with length greater than or equal to 1000 bp 26475, 74.11% 9887, 35.63% 8846, 28.86% 6486, 17.72% 13488, 72.51% 13356, 71.61% 12875, 67%
N, % transcripts with length greater than or equal to 5000 bp 4248, 11.89% 1036, 3.73% 914, 2.98% 608, 1.66% 2003, 10.77% 1931, 10.35% 1766, 9.19%
N, % transcripts with length greater than or equal to 10000 bp 620, 1.74% 93, 0.34% 90, 0.29% 59, 0.16% 228, 1.23% 221, 1.18% 202, 1.05%
N, % transcripts with length greater than or equal to 20000 bp 61, 0.17% 4, 0.01% 6, 0.02% 6, 0.02% 10, 0.05% 10, 0.05% 6, 0.03%
Mean transcript length (bp) 2596.37 1231.34 1058.16 750.14 2418.91 2366.8 2190.48
Median transcript length (bp) 1879 547.5 434 310 1756 1714 1575
Transcript length IQR (bp) 970-3370 258-1570 234-1238 206-641 926-3168 899-3096 762-2878
Transcript length range (bp) 18-71382 131-60159 131-58349 131-45657 131-50033 131-50033 131-49970
N50 (bp) 3836 2597 2388 1831 3590 3527 3401
L50 5542 3745 3737 3789 3836 3816 3748
N90 (bp) 1345 477 372 249 1215 1194 1135
L90 26715 14985 16939 22581 12212 12182 12096
Parameter RSP_0.005_11_20 RSP_0.01_11_20 RSP_0.02_11_20
N transcripts 18641 18719 19432
Total transcriptome length (bp) 45586004 45065633 43400943
GC content (%) 48.37% 48.39% 48.52%
N, % transcripts with length greater than or equal to 200 bp 18371, 98.55% 18312, 97.83% 18210, 93.71%
N, % transcripts with length greater than or equal to 500 bp 17019, 91.3% 16886, 90.21% 16573, 85.29%
N, % transcripts with length greater than or equal to 1000 bp 13620, 73.06% 13517, 72.21% 13131, 67.57%
N, % transcripts with length greater than or equal to 5000 bp 2024, 10.86% 1987, 10.61% 1872, 9.63%
N, % transcripts with length greater than or equal to 10000 bp 234, 1.26% 230, 1.23% 225, 1.16%
N, % transcripts with length greater than or equal to 20000 bp 10, 0.05% 12, 0.06% 11, 0.06%
Mean transcript length (bp) 2445.47 2407.48 2233.48
Median transcript length (bp) 1778 1749 1605
Transcript length IQR (bp) 944-3211 914-3153.5 776.75-2950
Transcript length range (bp) 131-50248 132-50317 132-50173
N50 (bp) 3602 3576 3485
L50 3876 3847 3794
N90 (bp) 1224 1214 1160
L90 12283 12234 12208

IQR = interquartile range
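
The N50, L50, N90 and L90 rows of Table 1 follow a convention that is easy to misread, so a minimal sketch of the usual definition may help. Sort the transcript lengths from longest to shortest and accumulate them until a given fraction of the total length is reached; the Nx value is the length of the transcript at that point and the Lx value is the number of transcripts needed to get there. The function name `nx_lx` and the toy lengths are hypothetical, and the sketch is not taken from the CATS-rb source, whose exact implementation may differ.

```python
def nx_lx(lengths, fraction=0.5):
    """Return (Nx, Lx) for a collection of sequence lengths.

    Conventional definition, assumed here rather than taken from CATS-rb:
    Nx is the length of the shortest sequence in the smallest set of
    longest sequences that together cover `fraction` of the total length;
    Lx is the number of sequences in that set.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")

    ordered = sorted(lengths, reverse=True)  # longest first
    threshold = fraction * sum(ordered)

    running_total = 0
    for count, length in enumerate(ordered, start=1):
        running_total += length
        if running_total >= threshold:
            return length, count

    # Only reachable through floating-point rounding at fraction == 1.
    return ordered[-1], len(ordered)


# Toy example with five hypothetical transcripts (total length 30 bp).
toy_lengths = [10, 8, 6, 4, 2]
n50, l50 = nx_lx(toy_lengths, 0.5)  # cumulative 10, 18 -> threshold 15 reached
n90, l90 = nx_lx(toy_lengths, 0.9)  # cumulative 10, 18, 24, 28 -> threshold 27 reached
print(n50, l50, n90, l90)
```

Under this definition a higher N50 together with a lower L50 indicates that fewer, longer transcripts carry half of the assembled length.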

Figure 1. Transcript length distribution.

Transcriptome assembly mapping analysis

Table 2. Transcriptome assembly mapping statistics.

Parameter d_melanogaster_bdgp6 RSP_0.005_1_4 RSP_0.01_1_4 RSP_0.02_1_4 RSP_0.005_5_10 RSP_0.01_5_10 RSP_0.02_5_10
N, % unmapped transcripts 499, 1.4% 7, 0.03% 2, 0.01% 4, 0.01% 3, 0.02% 2, 0.01% 4, 0.02%
Transcript alignment proportion (mean, IQR) 1, 1-1 0.98, 1-1 0.98, 1-1 0.97, 1-1 0.99, 1-1 0.99, 1-1 0.99, 1-1
N, % multimapped transcripts 1029, 2.88% 136, 0.73% 216, 0.7% 188, 0.68% 152, 0.82% 153, 0.8% 147, 0.79%
N, % structurally inconsistent transcripts 547, 1.53% 1622, 5.85% 1997, 6.52% 2842, 7.76% 342, 1.84% 385, 2.06% 696, 3.62%
N exons 77321 76835 72591 189645 83914 84838 79302
N exons per transcript (mean, IQR) 5.38, 2-7 2.81, 1-3 2.54, 1-3 2.08, 1-2 4.54, 2-6 4.46, 2-6 4.26, 2-6
Exon length (bp) (mean, IQR) 488.33, 145-555 427.36, 148-478 406.46, 149-450 360.03, 150-388 528.65, 152-612 526.65, 152-610 521.59, 152-603
Parameter RSP_0.005_11_20 RSP_0.01_11_20 RSP_0.02_11_20
N, % unmapped transcripts 0, 0.00% 2, 0.01% 9, 0.05%
Transcript alignment proportion (mean, IQR) 0.99, 1-1 0.99, 1-1 0.99, 1-1
N, % multimapped transcripts 153, 0.82% 162, 0.83% 249, 0.68%
N, % structurally inconsistent transcripts 301, 1.61% 390, 2.08% 511, 2.63%
N exons 81074 82706 84058
N exons per transcript (mean, IQR) 4.57, 2-6 4.52, 2-6 4.37, 2-6
Exon length (bp) (mean, IQR) 531.14, 153-613 529.81, 153-612 526.2, 153-610

IQR = interquartile range
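
Several IQR bounds in the tables are fractional (for example 914-3153.5 in Table 1), which suggests that the quartiles are interpolated between neighbouring values rather than taken directly from the data. The sketch below shows one common way to obtain such bounds, using linear interpolation between order statistics. Treating this as the method behind the reported values is an assumption, and the helper name `iqr_bounds` and the toy data are hypothetical.

```python
import statistics


def iqr_bounds(values):
    """Return (Q1, Q3) using linear interpolation between order statistics.

    Assumption: this mirrors the interpolation that produces fractional
    quartiles; the tool's actual quantile method is not confirmed here.
    """
    # method="inclusive" treats the data as the whole population and
    # interpolates linearly between adjacent sorted values.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q1, q3


# Toy example: eight hypothetical lengths.
q1, q3 = iqr_bounds([1, 2, 3, 4, 5, 6, 7, 8])
print(f"{q1}-{q3}")  # formatted like the "Q1-Q3" cells in the tables
```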

Figure 2. Transcript alignment proportion category distribution.

Figure 3. Number of exons per transcript category distribution.

Figure 4. Exon length distribution.

Exon set analysis

Table 3. Exon set statistics.

Parameter d_melanogaster_bdgp6 RSP_0.005_1_4 RSP_0.01_1_4 RSP_0.02_1_4 RSP_0.005_5_10 RSP_0.01_5_10 RSP_0.02_5_10
N exon sets 64093 63265 63560 62016 60365 60271 59789
Exon set length (bp) (mean, IQR) 559.41, 156-652 448.88, 153-514 425.07, 153-485 379.64, 153-421 558.64, 160-661 556.2, 160-658 550.61, 159-651
N, % exon sets included in completeness analyses 64093, 100% 63265, 100% 63560, 100% 62016, 100% 60365, 100% 60271, 100% 59789, 100%
N, % unique exon sets 2621, 4.09% 23, 0.04% 19, 0.03% 32, 0.05% 9, 0.01% 7, 0.01% 11, 0.02%
Unique exon set length (bp) (mean, IQR) 148.61, 69-152 537, 187.5-497 339.42, 169-464 276.88, 159-299.25 757.67, 181-521 368.57, 256-537 512.36, 158-476.5
N missing exon sets found in any transcriptome assembly 2635 92379 107392 147822 42656 45259 53532
N missing exon sets found in all other transcriptome assemblies 65 7872 12637 46703 512 673 1877
Common exon set length (bp) (mean, IQR) 698.57, 192-858 547.82, 175-655 514.03, 170-608 439.6, 158-504 674.74, 189-831 671.05, 189-828 660.14, 187-815
Relative common exon set length (mean, IQR) 1, 1-1 0.87, 0.79-1 0.84, 0.7-1 0.76, 0.51-1 0.98, 1-1 0.97, 1-1 0.96, 1-1
Relative exon score 0.995 0.714 0.671 0.57 0.894 0.887 0.864
N missing exon sets inside transcript sets 2593 41092 42522 43590 26926 28032 31207
N missing exon sets outside transcript sets 1 5095 7985 20138 357 453 909
Parameter RSP_0.005_11_20 RSP_0.01_11_20 RSP_0.02_11_20
N exon sets 60434 60320 60019
Exon set length (bp) (mean, IQR) 561.05, 161-663 559.62, 160-661 555.66, 160-658
N, % exon sets included in completeness analyses 60434, 100% 60320, 100% 60019, 100%
N, % unique exon sets 8, 0.01% 3, 0% 22, 0.04%
Unique exon set length (bp) (mean, IQR) 1503.38, 355-1313.75 188.67, 112.5-248.5 684.45, 170-529.5
N missing exon sets found in any transcriptome assembly 40918 43055 48555
N missing exon sets found in all other transcriptome assemblies 294 535 1046
Common exon set length (bp) (mean, IQR) 677.75, 190-835 675.04, 189-831 667.28, 188-825
Relative common exon set length (mean, IQR) 0.98, 1-1 0.98, 1-1 0.97, 1-1
Relative exon score 0.9 0.894 0.879
N missing exon sets inside transcript sets 25978 27081 29373
N missing exon sets outside transcript sets 314 360 644

IQR = interquartile range

Figure 5. Exon set length distribution.

Figure 6. Exon set genomic distribution.

Figure 7. Exon set UpSet plot.

Figure 8. Common exon set length distribution.

Figure 9. Common exon set relative length category distribution.

Figure 10. Unique exon set length distribution.

Figure 11. Exon set pairwise completeness similarity.

Figure 12. Pairwise exon set Venn diagrams.

Figure 13. Exon set hierarchical clustering heatmap.

Transcript set analysis

Table 4. Transcript set statistics.

Parameter d_melanogaster_bdgp6 RSP_0.005_1_4 RSP_0.01_1_4 RSP_0.02_1_4 RSP_0.005_5_10 RSP_0.01_5_10 RSP_0.02_5_10
N transcript sets 12056 19948 22201 26626 12286 12441 12824
Transcript set length (bp) (mean, IQR) 7414.43, 1046.75-5999 3766.79, 315-2322 3236.22, 274-1839 2338.12, 227-1059 6935.85, 1079-5587.75 6831.46, 1052-5389 6504.71, 995-5018.5
N isoforms per transcript set (mean, IQR) 2.92, 1-3 1.38, 1-1 1.36, 1-1 1.31, 1-1 1.51, 1-1 1.49, 1-1 1.45, 1-1
N, % transcript sets included in completeness analyses 11951, 99.13% 19913, 99.82% 22152, 99.78% 26417, 99.22% 12278, 99.93% 12436, 99.96% 12810, 99.89%
N, % unique transcript sets 134, 1.12% 9, 0.05% 7, 0.03% 16, 0.06% 3, 0.02% 2, 0.02% 6, 0.05%
Unique transcript set length (bp) (mean, IQR) 835.46, 192-872 1065.56, 184-540 266.86, 166-199.5 219.69, 154.75-257.5 514.33, 506.5-525.5 362, 264-460 460.17, 163.5-502.75
N missing transcript sets found in any transcriptome assembly 1471 8518 10192 14651 4370 4502 4618
N missing transcript sets found in all other transcriptome assemblies 13 832 1534 5798 65 65 195
Common transcript set length (bp) (mean, IQR) 8620.99, 1395-7172 6682.68, 893-5155 6268.57, 749-4698 5241.27, 496-3729 8089.64, 1377-6614 8050.52, 1377-6525 7842.08, 1346-6306
Relative common transcript set length (mean, IQR) 1, 1-1 0.74, 0.55-0.96 0.68, 0.46-0.92 0.53, 0.28-0.8 0.96, 0.97-1 0.95, 0.96-1 0.93, 0.93-1
Relative transcript score 0.986 0.691 0.626 0.485 0.933 0.927 0.906
Parameter RSP_0.005_11_20 RSP_0.01_11_20 RSP_0.02_11_20
N transcript sets 12172 12265 12541
Transcript set length (bp) (mean, IQR) 7041.55, 1093-5692 6975.79, 1082-5590 6740.52, 1049-5276
N isoforms per transcript set (mean, IQR) 1.53, 1-1 1.52, 1-1 1.48, 1-1
N, % transcript sets included in completeness analyses 12167, 99.96% 12262, 99.98% 12531, 99.92%
N, % unique transcript sets 2, 0.02% 2, 0.02% 11, 0.09%
Unique transcript set length (bp) (mean, IQR) 1381, 1128.5-1633.5 248.5, 202.25-294.75 3023.91, 297.5-1128.5
N missing transcript sets found in any transcriptome assembly 4307 4436 4319
N missing transcript sets found in all other transcriptome assemblies 27 39 130
Common transcript set length (bp) (mean, IQR) 8154.03, 1386-6680 8122.05, 1379-6619 7979.94, 1371-6464
Relative common transcript set length (mean, IQR) 0.96, 0.98-1 0.96, 0.97-1 0.94, 0.95-1
Relative transcript score 0.939 0.933 0.921

IQR = interquartile range

Figure 14. Transcript set length distribution.

Figure 15. Number of isoforms per transcript set category distribution.

Figure 16. Transcript set UpSet plot.

Figure 17. Common transcript set length distribution.

Figure 18. Common transcript set relative length category distribution.

Figure 19. Unique transcript set length distribution.

Figure 20. Transcript set pairwise completeness similarity.

Figure 21. Pairwise transcript set Venn diagrams.

Figure 22. Transcript set hierarchical clustering heatmap.

Figure 23. Unique exon set position in non-origin transcriptomes.

Figure 24. Missing exon set position.

Annotation-based analysis

Table 5. Annotation-based statistics.

Parameter d_melanogaster_bdgp6 RSP_0.005_1_4 RSP_0.01_1_4 RSP_0.02_1_4 RSP_0.005_5_10 RSP_0.01_5_10 RSP_0.02_5_10
N, % exon sets included in completeness analyses 64093, 100% 63265, 100% 63560, 100% 62016, 100% 60365, 100% 60271, 100% 59789, 100%
N, % matched transcriptome assembly exon sets (exon set precision) 64067, 99.96% 63210, 99.91% 63521, 99.94% 61966, 99.92% 60341, 99.96% 60240, 99.95% 59768, 99.96%
N, % matched GTF exon sets (exon set recall) 64067, 98.78% 50950, 78.56% 48427, 74.67% 41823, 64.49% 59176, 91.24% 58870, 90.77% 57811, 89.14%
Proportion of covered transcriptome assembly exon sets (mean, IQR) 1, 1-1 1, 1-1 1, 1-1 1, 1-1 1, 1-1 1, 1-1 1, 1-1
Annotation-based exon score 0.988 0.707 0.665 0.565 0.885 0.877 0.855
N, % transcript sets included in completeness analyses 11951, 99.13% 19913, 99.82% 22152, 99.78% 26417, 99.22% 12278, 99.93% 12436, 99.96% 12810, 99.89%
N, % matched transcriptome assembly transcript sets (transcript set precision) 11924, 99.77% 19876, 99.81% 22111, 99.81% 26381, 99.86% 12255, 99.81% 12410, 99.79% 12785, 99.8%
N, % matched GTF transcript sets (transcript set recall) 11864, 97.53% 9988, 82.11% 9229, 75.87% 7235, 59.48% 11544, 94.9% 11520, 94.71% 11467, 94.27%
Proportion of covered transcriptome assembly transcript sets (mean, IQR) 1, 1-1 1, 1-1 1, 1-1 1, 1-1 1, 1-1 1, 1-1 1, 1-1
Annotation-based transcript score 0.974 0.677 0.615 0.475 0.917 0.91 0.888
Parameter RSP_0.005_11_20 RSP_0.01_11_20 RSP_0.02_11_20
N, % exon sets included in completeness analyses 60434, 100% 60320, 100% 60019, 100%
N, % matched transcriptome assembly exon sets (exon set precision) 60406, 99.95% 60304, 99.97% 59979, 99.93%
N, % matched GTF exon sets (exon set recall) 59430, 91.63% 59187, 91.26% 58463, 90.14%
Proportion of covered transcriptome assembly exon sets (mean, IQR) 1, 1-1 1, 1-1 1, 1-1
Annotation-based exon score 0.891 0.885 0.87
N, % transcript sets included in completeness analyses 12167, 99.96% 12262, 99.98% 12531, 99.92%
N, % matched transcriptome assembly transcript sets (transcript set precision) 12146, 99.83% 12243, 99.85% 12495, 99.71%
N, % matched GTF transcript sets (transcript set recall) 11555, 94.99% 11570, 95.12% 11546, 94.92%
Proportion of covered transcriptome assembly transcript sets (mean, IQR) 1, 1-1 1, 1-1 1, 1-1
Annotation-based transcript score 0.922 0.919 0.905

IQR = interquartile range
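
The precision and recall rows of Table 5 are ratios of matched sets to two different totals. A minimal sketch, assuming the standard definitions: precision divides the matched assembly sets by the assembly sets included in the completeness analyses, while recall divides the matched GTF sets by the total number of sets in the GTF annotation. The GTF total is not reported in the table, so `recall` takes it as a parameter and no value is supplied for it here. The function names are hypothetical.

```python
def percent(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return round(100 * part / whole, 2)


def precision(matched_assembly_sets, assembly_sets_included):
    """Share of assembly sets that match an annotated (GTF) set."""
    return percent(matched_assembly_sets, assembly_sets_included)


def recall(matched_gtf_sets, total_gtf_sets):
    """Share of GTF sets recovered by the assembly.

    total_gtf_sets is not listed in Table 5 and must come from the
    annotation itself.
    """
    return percent(matched_gtf_sets, total_gtf_sets)


# Exon set precision recomputed from the counts in Table 5.
print(precision(64067, 64093))  # d_melanogaster_bdgp6
print(precision(63210, 63265))  # RSP_0.005_1_4
```

With the counts from Table 5, these two calls give 99.96 and 99.91, in line with the reported exon set precision for the same assemblies.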

Figure 25. Proportion of covered transcriptome exon sets by a GTF exon set category distribution.

Figure 26. Annotation-based exon set UpSet plot.

Figure 27. Annotation-based pairwise exon set Venn diagrams.

Figure 28. Annotation-based exon set hierarchical clustering heatmap.

Figure 29. Proportion of covered transcriptome transcript sets by a GTF transcript set category distribution.

Figure 30. Annotation-based transcript set UpSet plot.

Figure 31. Annotation-based pairwise transcript set Venn diagrams.

Figure 32. Annotation-based transcript set hierarchical clustering heatmap.